Add health check to `watch.stream` for silent connection drops by Urvashi0109 · Pull Request #2525 · kubernetes-client/python

Urvashi0109 · 2026-03-18T11:42:07Z

What type of PR is this?

/kind bug
/kind feature

What this PR does / why we need it:

When a watch is running on Kubernetes objects (e.g., Jobs, Pods, Namespaces) and the Kubernetes control plane is upgraded, the watch connection is silently dropped. The watcher hangs indefinitely: no exception is raised and no new events are received. This happens because the TCP connection enters a half-open state in which the client believes the connection is still alive, but the server side has been torn down during the upgrade.
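
To illustrate why a read timeout can surface this kind of hang, here is a minimal sketch that is not taken from the PR. It uses `socket.socketpair()` to stand in for a connection whose peer never sends anything, and the helper name `read_with_timeout` is hypothetical. Without a timeout, the `recv` call would block forever; with one, the silence becomes observable.

```python
import socket


def read_with_timeout(sock, timeout_seconds):
    """Return received bytes, or None if nothing arrives within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # No data within the interval: the caller can treat this as a
        # possible silent drop instead of blocking indefinitely.
        return None


# The peer stays open but silent, which is roughly how a half-open
# connection looks from the client's side (an illustration, not a real one).
client, silent_peer = socket.socketpair()
try:
    result = read_with_timeout(client, 0.1)
finally:
    client.close()
    silent_peer.close()
```

In this sketch `result` is `None` because the peer never writes; a caller could use that signal to decide to reconnect.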

This PR adds a _health_check_interval parameter to watch.stream() that detects silent connection drops and automatically reconnects:

When _health_check_interval is set to a value > 0, a socket-level read timeout (_request_timeout) is configured on the HTTP connection.
If no data arrives within the specified interval, urllib3 raises a ReadTimeoutError.
The watch catches this exception and automatically reconnects using the last known resource_version, so that events are not missed.
The feature is disabled by default (_health_check_interval=0), preserving full backward compatibility.
When disabled, ReadTimeoutError propagates to the caller as before.
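
The reconnect behavior described above can be sketched without the Kubernetes client. This is a simplified model under stated assumptions, not the PR's implementation: the built-in `TimeoutError` stands in for urllib3's `ReadTimeoutError`, `open_stream` and `make_flaky_stream` are hypothetical helpers, `max_reconnects` is an addition for the sketch only, and resource versions are treated as list indices purely for the simulation (real resource versions are opaque strings).

```python
def stream_with_health_check(open_stream, health_check_interval=0,
                             max_reconnects=3):
    """Yield (resource_version, payload) events, reconnecting on timeout.

    open_stream(resource_version) is a hypothetical callable returning an
    iterator of events; it may raise TimeoutError when no data arrives.
    """
    resource_version = None
    reconnects = 0
    while True:
        try:
            for version, payload in open_stream(resource_version):
                resource_version = version  # remember where we got to
                yield version, payload
            return  # the stream ended normally
        except TimeoutError:
            # Disabled (interval <= 0): propagate to the caller as before.
            if health_check_interval <= 0 or reconnects >= max_reconnects:
                raise
            reconnects += 1  # loop again from the last seen version


EVENTS = [("1", "ADDED a"), ("2", "ADDED b"), ("3", "MODIFIED a")]


def make_flaky_stream():
    """Build a fake stream that times out once, before the third event."""
    calls = []

    def open_stream(resource_version):
        calls.append(resource_version)
        # Simulation only: use the version as an index into EVENTS.
        start = 0 if resource_version is None else int(resource_version)
        for index in range(start, len(EVENTS)):
            if len(calls) == 1 and index == 2:
                raise TimeoutError("no data within the interval")
            yield EVENTS[index]

    return open_stream, calls


open_stream, calls = make_flaky_stream()
events = list(stream_with_health_check(open_stream, health_check_interval=30))
```

With the health check enabled, `events` contains all three entries and `calls` records a second connection starting from version `"2"`. With the default interval of 0, the same fake stream lets the `TimeoutError` reach the caller after two events.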

Which issue(s) this PR fixes:

Fixes #2462

Special notes for your reviewer:

This PR takes the approach of leveraging urllib3's existing read timeout mechanism (_request_timeout) to break out of the blocking read, then catching the resulting ReadTimeoutError/ProtocolError exceptions.
The _ prefix on _health_check_interval follows the existing convention in this codebase (e.g., _preload_content, _request_timeout) for parameters that are consumed by the client library rather than passed to the API server.
5 new unit tests were added; all 24 tests (19 existing + 5 new) pass with zero regressions.

Does this PR introduce a user-facing change?

Added `_health_check_interval` parameter to `watch.stream()` to detect and recover from silent connection drops during Kubernetes control plane upgrades. When set to a value > 0 (seconds), the watch will automatically reconnect if no data is received within the specified interval. Disabled by default for backward compatibility.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

k8s-ci-robot · 2026-03-18T11:42:17Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Urvashi0109
Once this PR has been reviewed and has the lgtm label, please assign yliaog for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

kubernetes/base/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

Adds an optional “health check” mechanism to Watch.stream() intended to detect silent watch connection drops (e.g., during control plane upgrades) by configuring timeouts and retrying from the last observed resource_version.

Changes:

Introduces _health_check_interval parameter in Watch.stream() and handles ReadTimeoutError/ProtocolError to trigger reconnects.
Auto-populates _request_timeout from _health_check_interval when not explicitly provided.
Adds unit tests covering reconnect behavior, default behavior, timeout propagation, and request-timeout argument handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| kubernetes/base/watch/watch.py | Adds `_health_check_interval` handling, sets timeouts, and retries on read/connection errors. |
| kubernetes/base/watch/watch_test.py | Adds tests validating reconnect + timeout configuration/compatibility. |


kubernetes/base/watch/watch.py

+                # If health check is enabled, treat a read timeout as a
+                # silent connection drop and allow the outer while loop
+                # to reconnect using the last known resource_version.
+                if health_check_interval > 0:
+                    pass  # Fall through to retry logic below


kubernetes/base/watch/watch.py

+            except (ReadTimeoutError, ProtocolError) as e:
+                # If health check is enabled, treat a read timeout as a
+                # silent connection drop and allow the outer while loop
+                # to reconnect using the last known resource_version.
+                if health_check_interval > 0:
+                    pass  # Fall through to retry logic below
+                else:
+                    raise


kubernetes/base/watch/watch_test.py

+        # Verify _request_timeout was set to the health check interval
+        fake_api.get_namespaces.assert_called_once_with(
+            _preload_content=False, watch=True,
+            timeout_seconds=10, _request_timeout=30)


kubernetes/base/watch/watch_test.py

+        # Verify the user's _request_timeout (60) was preserved, not overridden
+        fake_api.get_namespaces.assert_called_once_with(
+            _preload_content=False, watch=True,
+            timeout_seconds=10, _request_timeout=60)


kubernetes/base/watch/watch.py

+        if health_check_interval > 0 and '_request_timeout' not in kwargs:
+            kwargs['_request_timeout'] = health_check_interval
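
The defaulting rule in this hunk, together with the two test assertions quoted above, can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical helper named `apply_health_check_timeout` that copies the keyword arguments instead of mutating them as the PR does; the values 10, 30, and 60 mirror the quoted tests.

```python
def apply_health_check_timeout(kwargs, health_check_interval):
    """Return a copy of kwargs with _request_timeout defaulted if absent."""
    result = dict(kwargs)
    if health_check_interval > 0 and '_request_timeout' not in result:
        result['_request_timeout'] = health_check_interval
    return result


# Health check enabled, no explicit timeout: the interval is used.
defaulted = apply_health_check_timeout({'timeout_seconds': 10}, 30)

# Health check enabled, explicit timeout: the user's value is preserved.
preserved = apply_health_check_timeout(
    {'timeout_seconds': 10, '_request_timeout': 60}, 30)

# Health check disabled: nothing is added.
disabled = apply_health_check_timeout({'timeout_seconds': 10}, 0)
```

One consequence of the `not in` check is that an explicit `_request_timeout=None` would also be left untouched, since the key is present; whether that is the intended behavior is a question for the PR author.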


Added health check to watch.stream for silent connection drops

8afcb14

k8s-ci-robot requested review from fabianvf and yliaog March 18, 2026 11:42

k8s-ci-robot added the size/L (denotes a PR that changes 100-499 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels Mar 18, 2026

Urvashi0109 marked this pull request as ready for review March 18, 2026 11:43

Copilot AI review requested due to automatic review settings March 18, 2026 11:43

k8s-ci-robot removed the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) label Mar 18, 2026

k8s-ci-robot requested a review from roycaihw March 18, 2026 11:43

Copilot started reviewing on behalf of Urvashi0109 March 18, 2026 11:44

Copilot AI reviewed Mar 18, 2026

Urvashi0109 wants to merge 1 commit into kubernetes-client:master from Urvashi0109:Fix-Watch-Health-Check

3 participants
